NBA EDA Project

The EDA project in this course has four main parts to it:
1. Project Proposal
2. Phase 1
3. Phase 2
4. Report

This notebook will be used for Project Proposal, Phase 1, and Phase 2. You will have specific questions to answer within this notebook for Project Proposal and Phase 1. You will also continue using this notebook for Phase 2. However, guidance and expectations can be found on Canvas for that assignment. The report is completed outside of this notebook (delivered as a PDF). Detailed instructions for that assignment are provided in Canvas.

Read this before proceeding:

1. Review the list of data sets and sources of data to avoid before choosing your data. This list is provided in the instructions for the Project Proposal assignment in Canvas.

2. It is expected that when you are asked questions requiring typed explanations you are to use a markdown cell to type your answers neatly. Do not provide typed answers to questions as extra comments within your code. Only provide comments within your code as you normally would, i.e. as needed to explain or remind yourself what each part of the code is doing.

Project Proposal

The intent of this assignment is for you to share your chosen data file(s) with your instructor and provide general information on your goals for the EDA project.
Step 1 (2 pts): Give a brief description of the source(s) of your data and include a direct link to your data.

The data I am using can be found at https://www.basketball-reference.com/leagues/NBA_2020_totals.html

The data contains individual player statistics for the NBA. It has 30 attributes that measure overall player statistics for the regular NBA seasons from 2015 to 2020.

Step 2 (2 pts): Briefly explain why you chose this data.

I have been watching NBA basketball since childhood and am a huge fan of the Chicago Bulls, which led to my interest in choosing the overall NBA player statistics data. My initial intuition is that in the past few years NBA players have been scoring more points from 3-point shots, and that players who do so are more likely to be on teams that make the NBA playoffs and win the Larry O'Brien Championship Trophy.

Step 3 (1 pt): Provide a brief overview of your goals for this project.

The goal of this project is to confirm whether my hunch is correct: that shooting 3-pointers is a more effective strategy than shooting 2-pointers. Based on my initial intuition, I suspect that NBA players who score more from 3-point range are more likely to lead in total points and to be on teams that make the NBA playoffs. To answer this question, I need to analyze the data from the 2015-2020 NBA seasons and look further into other player attributes such as Position, Team, Games, Minutes Played, and Field Goal Percentages. Because the NBA player statistics data does not contain playoff information, I will create reference data listing the teams that reached the playoffs each year and merge it with the player dataset.

Step 4 (1 pt): Read the data into this notebook.
Step 5 (1 pt): Inspect the data using the info( ), head( ), and tail( ) methods.
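
A minimal sketch of Steps 4 and 5, assuming the season table has been saved as a CSV. The in-memory text below is a made-up placeholder (not real statistics) that stands in for the downloaded file, and the variable name `nba_2020` is hypothetical.

```python
import io
import pandas as pd

# Made-up placeholder rows; in the notebook this would be a path to the
# downloaded CSV file instead of an in-memory string.
sample_csv = io.StringIO(
    "Rk,Player,Pos,Age,Tm,G,3P,2P,PTS\n"
    "1,Player A,SG,25,CHI,60,120,200,900\n"
    "2,Player B,C,30,BOS,70,5,350,800\n"
    "3,Player C,PG,22,CHI,10,8,12,55\n"
)

nba_2020 = pd.read_csv(sample_csv)

nba_2020.info()          # column names, dtypes, and non-null counts
print(nba_2020.head(2))  # first rows
print(nba_2020.tail(2))  # last rows
```
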
STOP HERE for your Project Proposal assignment. Submit your (1) original data file(s) along with (2) the completed notebook up to this point, and (3) the html file for grading and approval.
Instructor Feedback and Approval (3 pts): Your instructor will provide feedback in either the cell below this or via Canvas. You can expect one of the following point values for this portion.
3 pts - if your project goals and data set are both approved.
2 pts - if your data set is approved but changes to your project goals (Step 3) are needed.
1 pt - if your project goals are approved but your data set is not approved.
0 pts - if neither your data set nor your project goals are approved.

As needed, follow your instructor's feedback and guidance to get on track for the remaining portions of the EDA project.

EDA Phase 1

The overall goal of this assignment is to take all necessary steps to inspect the quality of your data and prepare the data according to your needs. For information and resources on the process of Exploratory Data Analysis (EDA), you should explore the EDA Project Resources Module in Canvas. Once you’ve read through the information provided in that module and have a comfortable understanding of EDA using Python, complete steps 6 through 10 listed below to satisfy the requirements for your EDA Phase 1 assignment. **Remember to convert code cells provided to markdown cells for any typed responses to questions.**
Step 6 (2 pts): Begin by elaborating in more detail from the previous assignment on why you chose this data?
1. Explain what you hope to learn from this data.
2. Do you have a hunch about what this data will reveal?

(The answer to this question will be used in the Introduction section of your EDA report.)

I hope to learn whether there is a positive or negative correlation between NBA players' average scoring from 3-pointers and 2-pointers and their teams' chances of making the NBA playoffs. I also want to explore how player position relates to these shooting categories.

I have a hunch this data will reveal that NBA players who make more 3-pointers per game, and who have good offensive stats in categories like '3Pointers', 'FieldGoalsPerGame', and 'TotalPointsPerGame', are more likely to lead in 'TotalPoints' and to be on a 'Team' that makes the 'Playoff' in a season than players who mainly score 2-pointers. I also expect that Shooting Guards (SG) and Point Guards (PG) are more likely to make more 3-pointers per game and lead in total points per season, while Centers (C), Small Forwards (SF), and Power Forwards (PF) are more likely to score on 2-pointers.

Step 7 (2 pts): Discuss the population and the sample:
1. What is the population being represented by the data you’ve chosen?
2. What is the total sample size?

The population being represented is all NBA players in the 2015-2020 regular seasons, described by their overall statistics, together with reference data indicating which players were on teams that made the playoffs in those seasons.

The sample size is 3196 rows (with 37 columns), containing NBA player statistics and playoff reference data from 2015-2020.

Step 8 (2 pts): Describe how the data was collected. For example, is this a random sample? Are sampling weights used with the data?

The data was collected from https://www.basketball-reference.com/, which makes a variety of NBA stats publicly available. Our data focuses on overall NBA player statistics for the 2015-2020 seasons, downloaded from https://www.basketball-reference.com/leagues/NBA_2020_totals.html

This is not a random sample, and sampling weights are not used with the data.

Step 9 (4 pts): In the Project Proposal assignment you used the info( ) method to inspect the variables, their data types, and the number of non-null values. Using that information as a guide, provide definitions of each of your variables and their corresponding data types, i.e. a data dictionary. Also indicate which variables will be used for your purposes.
| # | Variable | Definition | DataType | Will be Used |
|---|---|---|---|---|
| 1 | Rk | Rank of the player for each season based on overall statistics | Integer | |
| 2 | Player | Name of the NBA player for each season | String | X |
| 3 | Pos | Position the NBA player plays | String | X |
| 4 | Age | Age of the NBA player | String | X |
| 5 | Tm | Team of the NBA player | String | X |
| 6 | G | Number of games played by the NBA player in that season | Integer | X |
| 7 | GS | Number of games started by the NBA player in that season | Integer | |
| 8 | MP | Number of minutes played by the NBA player in that season | Integer | X |
| 9 | FG | Number of field goals the NBA player made. This includes both 2-pointers and 3-pointers | Integer | X |
| 10 | FGA | Number of field goals the NBA player attempted. This includes both 2-pointers and 3-pointers | Integer | X |
| 11 | FG% | Percentage of field goal attempts the NBA player made in that season | Float | X |
| 12 | 3P | Number of 3-point field goals the NBA player made in that season | Integer | X |
| 13 | 3PA | Number of 3-point field goals the NBA player attempted in that season | Integer | X |
| 14 | 3P% | Percentage of 3-point field goal attempts the NBA player made in that season | Float | X |
| 15 | 2P | Number of 2-point field goals the NBA player made in that season | Integer | X |
| 16 | 2PA | Number of 2-point field goals the NBA player attempted in that season | Integer | X |
| 17 | 2P% | Percentage of 2-point field goal attempts the NBA player made in that season | Float | X |
| 18 | eFG% | Effective field goal percentage, which adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal | Float | X |
| 19 | FT | Number of free throws the NBA player made in that season | Integer | |
| 20 | FTA | Number of free throws the NBA player attempted in that season | Integer | |
| 21 | FT% | Percentage of free throw attempts the NBA player made in that season | Float | |
| 22 | ORB | Number of offensive rebounds the NBA player collected while playing on offense in that season | Integer | |
| 23 | DRB | Number of defensive rebounds the NBA player collected while playing on defense in that season | Integer | |
| 24 | TRB | Number of total rebounds the NBA player collected in that season | Integer | |
| 25 | AST | Number of assists. An assist is a pass to another player that leads directly to a basket | Integer | |
| 26 | STL | Number of times the NBA player, on defense, took the ball from a player on offense in that season | Integer | |
| 27 | BLK | Number of blocks. A block occurs when an offensive player attempts a shot and the defensive player tips the ball, blocking the chance to score | Integer | |
| 28 | TOV | Number of turnovers. A turnover occurs when the player on offense loses the ball to the defense. Collected for each NBA player in that season | Integer | |
| 29 | PF | Number of personal fouls the NBA player committed in that season | Integer | |
| 30 | PTS | Number of points scored by the NBA player in that season | Integer | X |
| 31 | Year | Reference column that keeps track of which season each row of player statistics comes from | String | X |
| 32 | Playoff | Reference column that keeps track of which NBA players were on a team that made the playoffs in the corresponding season | String | X |
| 33 | MinutesPerGame | Calculated column: 'MinutesPlayed' divided by 'GamesPlayed'. The source columns hold season totals, so a per-game value gives better analysis and interpretation of the data | Float | X |
| 34 | FieldGoalsPerGame | Calculated column: 'FieldGoals' divided by 'GamesPlayed'. The source columns hold season totals, so a per-game value gives better analysis and interpretation of the data | Float | X |
| 35 | 3PointerPerGame | Calculated column: '3Pointers' divided by 'GamesPlayed'. The source columns hold season totals, so a per-game value gives better analysis and interpretation of the data | Float | X |
| 36 | 2PointerPerGame | Calculated column: '2Pointers' divided by 'GamesPlayed'. The source columns hold season totals, so a per-game value gives better analysis and interpretation of the data | Float | X |
| 37 | TotalPointsPerGame | Calculated column: 'TotalPoints' divided by 'GamesPlayed'. The source columns hold season totals, so a per-game value gives better analysis and interpretation of the data | Float | X |

Step 10 (10 pts): For full credit in this problem you'll want to take all necessary steps to report on the quality of the data and clean the data accordingly. Some things to consider while doing this are listed below. Depending on your data and goals, there may be additional steps needed than those listed here.
1. Are there rows with missing or inconsistent values? If so, eliminate those rows from your data where appropriate.
2. Are there any outliers or duplicate rows? If so, eliminate those rows from your data where appropriate. At each stage of cleaning the data, state how many rows were eliminated.
3. Are you using all columns (variables) in the data? If not, are you eliminating those columns?
4. Consider some type of visual display such as a boxplot to determine any outliers. Do any outliers need removed? If so, how many were removed?

At each stage of cleaning the data, state how many rows were eliminated. It is good practice to get the shape of the data before and after each step in cleaning the data and add typed explanations (in separate markdown cells) of the steps taken to clean the data.
Include the rest of your work below and insert cells where needed.

The first step is adding a Year column to each season's NBA statistics dataframe. The second step is to concatenate all the season dataframes into one large dataset, the NBA Players Statistics dataframe. The third step is merging the nba_player_stats dataframe from the second step with the nba_playoffs_data dataframe using an outer join on the Year and Team columns, so that after the merge we can analyze which teams made the playoffs in which year. The joined table is assigned to overall_nba_playoffs_stats.

We concatenate all the NBA season dataframes, each with its added Year column, into one large dataset. The resulting NBA Players Statistics dataframe is stored in the variable nba_players_stats below.
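
The tagging and concatenation can be sketched as follows. This is only an illustration: the two season frames hold made-up placeholder rows, and the names `nba_2019`, `nba_2020`, and `season_frames` are hypothetical.

```python
import pandas as pd

# Made-up placeholder rows standing in for two season tables.
nba_2019 = pd.DataFrame(
    {"Player": ["Player A", "Player B"], "Tm": ["CHI", "BOS"], "PTS": [900, 800]}
)
nba_2020 = pd.DataFrame(
    {"Player": ["Player A", "Player C"], "Tm": ["CHI", "MIA"], "PTS": [950, 400]}
)

# Tag each season with its year (kept as a string, as in the data dictionary).
season_frames = {"2019": nba_2019, "2020": nba_2020}
tagged = [df.assign(Year=year) for year, df in season_frames.items()]

# Stack the seasons into one dataframe with a fresh 0..n-1 index.
nba_player_stats = pd.concat(tagged, ignore_index=True)
print(nba_player_stats.shape)
```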

Step 1: We merge nba_player_stats, the concatenated dataframe, with the nba_playoffs_data reference dataframe, which includes Year, Team, and Playoffs columns. The Playoffs column shows 'Y' in every row because all the teams in the reference data made the playoffs.

Step 2: We do an outer join on the Year and Team columns. This returns all the rows from the left dataframe and all the rows from the right dataframe, matched on Year and Team, with Playoffs set to 'Y' for the teams that made the playoffs and NaN for the teams that did not.
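
A small sketch of this outer join, using made-up rows. The key column names `Year` and `Team` and the flag column name `Playoff` are assumptions here; the actual frames may label them differently.

```python
import pandas as pd

# Made-up placeholder rows; column names are assumed for illustration.
nba_player_stats = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Team": ["CHI", "BOS", "MIA"],
    "Year": ["2020", "2020", "2020"],
})
nba_playoffs_data = pd.DataFrame({
    "Year": ["2020", "2020"],
    "Team": ["BOS", "MIA"],
    "Playoff": ["Y", "Y"],
})

# Outer join: keep every row from both sides; unmatched rows get NaN.
overall_nba_playoffs_stats = nba_player_stats.merge(
    nba_playoffs_data, how="outer", on=["Year", "Team"]
)
print(overall_nba_playoffs_stats)
```

In this toy example the CHI row has no match in the reference data, so its `Playoff` value is NaN after the merge.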

We check the rows in the dataframe below to see which teams did not make the playoffs; these have NaN values after the outer join. We set these NaN values to Playoff = 'N', meaning that these NBA players and teams did not make the playoffs in that year, based on the historical five-year data.

The query below shows that 1491 NBA players were on teams that made the playoffs and 1705 were on teams that did not, for the NBA seasons from 2015-2020.
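
One way to fill the unmatched rows and count both groups is sketched below with made-up rows; the column name `Playoff` is taken from the data dictionary, and `playoff_counts` is a hypothetical name.

```python
import pandas as pd

# Made-up placeholder rows: None marks a team with no match in the reference data.
overall_nba_playoffs_stats = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Playoff": [None, "Y", "Y"],
})

# Teams without a playoff match are labeled 'N'.
overall_nba_playoffs_stats["Playoff"] = (
    overall_nba_playoffs_stats["Playoff"].fillna("N")
)

# Count rows per playoff flag.
playoff_counts = overall_nba_playoffs_stats["Playoff"].value_counts()
print(playoff_counts)
```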

A variable is created that lists all the columns to be dropped because they are not relevant to the data analysis.

A new dataframe named overall_nba_playoffs_stats2 is created by using the col_to_drop variable to drop the 13 irrelevant columns from the table.

Next, the shape of the table shows that only 19 of the 32 columns remain for further analysis.
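
A sketch of the column drop with a shape check before and after. The 13 names in `col_to_drop` are the variables not marked as used in the data dictionary; the single data row is a made-up placeholder with only a few of the kept columns.

```python
import pandas as pd

# The 13 columns not marked "Will be Used" in the data dictionary.
col_to_drop = ["Rk", "GS", "FT", "FTA", "FT%", "ORB", "DRB",
               "TRB", "AST", "STL", "BLK", "TOV", "PF"]

# One made-up row: three kept columns plus the 13 columns to drop.
row = {"Player": "Player A", "Tm": "CHI", "PTS": 900}
row.update({col: 0 for col in col_to_drop})
overall_nba_playoffs_stats = pd.DataFrame([row])

print(overall_nba_playoffs_stats.shape)   # shape before dropping
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats.drop(columns=col_to_drop)
print(overall_nba_playoffs_stats2.shape)  # shape after dropping
```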

The columns are renamed so that it is easier to understand what each one means. The new names are assigned to the remaining 19 columns of overall_nba_playoffs_stats2, and we confirmed that the columns were renamed in the main dataframe.
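
The renaming could look like the sketch below. The new names are the ones used in this notebook's text, but the pairing of each raw column with a new name in `rename_map` is an assumption, and only a subset of the columns is shown.

```python
import pandas as pd

# Assumed raw-to-readable pairing, inferred from names used in the text.
rename_map = {
    "Tm": "Team",
    "G": "GamesPlayed",
    "MP": "MinutesPlayed",
    "FG": "FieldGoals",
    "FG%": "FieldGoals%",
    "3P": "3Pointers",
    "3P%": "3Pointers%",
    "2P": "2Pointers",
    "2P%": "2Pointers%",
    "eFG%": "EffectiveFieldGoals%",
    "PTS": "TotalPoints",
}

# Empty frame with raw column names, just to show the effect of rename().
overall_nba_playoffs_stats2 = pd.DataFrame(columns=["Player"] + list(rename_map))
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2.rename(columns=rename_map)

# Confirm the new names.
print(list(overall_nba_playoffs_stats2.columns))
```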

I have created 5 extra calculated reference columns, MinutesPerGame, FieldGoalsPerGame, 3PointerPerGame, 2PointerPerGame, and TotalPointsPerGame, for better understanding and interpretation of the data. These per-game columns also make it easier for the viewer to follow the Exploratory Data Analysis.

Next, the shape of the table shows 24 columns remaining for further analysis.
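
The per-game columns from the data dictionary are each a season total divided by 'GamesPlayed'. A minimal sketch with made-up totals (the names `stats` and `per_game_sources` are hypothetical):

```python
import pandas as pd

# Made-up season totals for two players.
stats = pd.DataFrame({
    "GamesPlayed": [10, 20],
    "MinutesPlayed": [250, 300],
    "FieldGoals": [40, 50],
    "3Pointers": [10, 0],
    "2Pointers": [30, 50],
    "TotalPoints": [100, 110],
})

# New per-game column -> season-total column it is derived from.
per_game_sources = {
    "MinutesPerGame": "MinutesPlayed",
    "FieldGoalsPerGame": "FieldGoals",
    "3PointerPerGame": "3Pointers",
    "2PointerPerGame": "2Pointers",
    "TotalPointsPerGame": "TotalPoints",
}

for new_col, total_col in per_game_sources.items():
    stats[new_col] = stats[total_col] / stats["GamesPlayed"]

print(stats.shape)
```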

We have removed all irrelevant columns and renamed the remaining columns for easier interpretation. Now we need to identify the missing values in the dataframe so the data can be properly analyzed. To do this we use 'overall_nba_playoffs_stats2.isnull().sum()', which counts the missing values in each column.
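
As a small illustration of what `isnull().sum()` returns, here is a sketch on made-up rows (only three columns are shown):

```python
import pandas as pd

nan = float("nan")

# Made-up placeholder rows with some missing percentages.
overall_nba_playoffs_stats2 = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "FieldGoals%": [0.45, nan, 0.50],
    "3Pointers%": [0.38, nan, nan],
})

# One count of missing values per column.
missing_per_column = overall_nba_playoffs_stats2.isnull().sum()
print(missing_per_column)
```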

To clean up the missing values we take a step-by-step approach rather than calling 'dropna' on the entire dataframe. Depending on position, some players may never attempt 3-pointers and others may never attempt 2-pointers, which leaves the corresponding percentage columns missing. We create variables for the rows with missing values in each column and then update the dataframe.

All rows with missing values in both 'FieldGoals%' and 'EffectiveFieldGoals%' are removed from overall_nba_playoffs_stats2. These missing values mean that the player attempted no field goals, and so made none, in that season. The number of games played by these players was very low, possibly because of injury.

All rows with missing values in '3Pointers%' and '2Pointers%' in the overall_nba_playoffs_stats2 dataframe are checked. As mentioned earlier, depending on position some players may not attempt 3-pointers and others may not attempt 2-pointers, which leaves these percentage columns missing. These rows are not dropped. Instead, we replace all remaining missing values with 0 using 'overall_nba_playoffs_stats2.fillna(0, inplace=True)'.
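
The two-stage cleanup can be sketched as below on made-up rows. This sketch drops a row only when both field goal percentages are missing (`how="all"`), which is one reading of the description above; the notebook's actual call may differ.

```python
import pandas as pd

nan = float("nan")

# Made-up rows: Player B has no field goal attempts; Player C has no 3-point attempts.
overall_nba_playoffs_stats2 = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "FieldGoals%": [0.45, nan, 0.60],
    "EffectiveFieldGoals%": [0.52, nan, 0.60],
    "3Pointers%": [0.38, nan, nan],
    "2Pointers%": [0.50, nan, 0.60],
})

# Stage 1: drop rows where both field goal percentages are missing.
rows_before = len(overall_nba_playoffs_stats2)
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2.dropna(
    subset=["FieldGoals%", "EffectiveFieldGoals%"], how="all"
)
rows_removed = rows_before - len(overall_nba_playoffs_stats2)

# Stage 2: keep the remaining rows and fill leftover gaps with 0.
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2.fillna(0)
print(rows_removed, overall_nba_playoffs_stats2.shape)
```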

A boxplot is created for 'GamesPlayed' and 'Age' to reveal any significant outliers.

Explaining Box Plot in GamesPlayed Column: The GamesPlayed box plot displays the five-number summary. The lower whisker shows that the minimum number of games played by an NBA player is 1, and the upper whisker shows that the most games played in a season is 82. The lower quartile shows that 25% of players played fewer than 20 games, and the upper quartile shows that 75% played fewer than 68 games. The median, marked by the line inside the interquartile box, is 48 games. There are no significant outliers in the GamesPlayed box plot.

Explaining Box Plot in Age Column: The Age box plot displays the five-number summary. The lower whisker shows that the youngest NBA player in our data is 19, and the upper whisker reaches age 39. The lower quartile shows that 25% of players are younger than 23, and the upper quartile shows that 75% are younger than 29. The median, marked by the line inside the interquartile box, is 26. Outliers are plotted as individual dots in line with the whiskers: ages beyond the upper whisker of 39, up to a maximum of 43, are outliers and will be removed from our data.
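
The quantities a box plot draws can also be computed directly. The sketch below uses made-up ages and assumes the common 1.5 x IQR rule for the whisker limits, which may not match the settings used for the plots in this notebook.

```python
import pandas as pd

# Made-up ages, not the real data.
ages = pd.Series([19, 21, 23, 23, 25, 26, 27, 29, 29, 31, 43], name="Age")

# Quartiles and median (the box and the line inside it).
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumed whisker limits: 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are drawn as individual outlier dots.
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
print(q1, median, q3, lower_fence, upper_fence)
print(outliers.tolist())
```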

The shape of the dataframe shows 3179 rows before dropping NBA players older than 39.

The shape of the dataframe shows 3172 rows after dropping NBA players older than 39.
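
A sketch of the age filter with a row count before and after, on made-up rows (the cutoff of 39 is the one stated above):

```python
import pandas as pd

# Made-up placeholder rows.
overall_nba_playoffs_stats2 = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [19, 26, 39, 43],
})

shape_before = overall_nba_playoffs_stats2.shape

# Keep only players aged 39 or younger.
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2[
    overall_nba_playoffs_stats2["Age"] <= 39
]

shape_after = overall_nba_playoffs_stats2.shape
rows_removed = shape_before[0] - shape_after[0]
print(shape_before, shape_after, rows_removed)
```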

Explaining Box Plot in MinutesPerGame Column: The MinutesPerGame box plot displays the five-number summary. The lower whisker shows that the minimum minutes per game played by an NBA player is 0.67 minutes, and the upper whisker shows that the maximum is 42 minutes. The lower quartile shows that 25% of players played fewer than 12.18 minutes per game, and the upper quartile shows that 75% played fewer than 26.70 minutes per game. The median, marked by the line inside the interquartile box, is 19.28 minutes per game. There are no significant outliers in the MinutesPerGame box plot.

I was curious to see which NBA player had the lowest minutes per game in the data. I queried the data below because such a low minutes-per-game figure did not seem realistic for an NBA season.

Explaining Box Plot in FieldGoalsPerGame Column: The FieldGoalsPerGame box plot displays the five-number summary. The lower whisker shows that the minimum field goals per game by an NBA player is 0. This column counts both three-point and two-point shots, and depending on position a player may shoot only 2-pointers or only 3-pointers. The upper whisker shows that the maximum field goals per game is 8. The lower quartile shows that 25% of players made fewer than 1 field goal per game, and the upper quartile shows that 75% made fewer than 4. The median, marked by the line inside the interquartile box, is 3 field goals per game. Even though there are outliers in the FieldGoalsPerGame box plot, I will keep them because, as mentioned above, field goals depend heavily on a player's position and are central to the analysis of my hypothesis.

The shape of the dataframe is still 3172 rows after analyzing the FieldGoalsPerGame box plot.

Explaining Box Plot in 3PointerPerGame Column: The 3PointerPerGame box plot displays the five-number summary. The lower whisker shows that the minimum three-pointers per game by an NBA player is 0 because, depending on position, some players shoot only 2-pointers. The upper whisker shows a maximum of 5 three-pointers per game. The lower quartile is about 0, meaning that 25% of players made essentially no three-pointers per game; for instance, players at positions such as 'Center' or 'PowerForward' mostly do not shoot 3-pointers in a game. The upper quartile shows that 75% of players made fewer than 1 three-pointer per game. The median, marked by the line inside the interquartile box, is about 1 three-pointer per game in a regular season.

Explaining Box Plot in 2PointerPerGame Column: The 2PointerPerGame box plot displays the five-number summary. The lower whisker shows that the minimum two-pointers per game by an NBA player is 0 because, depending on position, a player may shoot only 2-pointers or only 3-pointers. The upper whisker shows a maximum of 10 two-pointers per game. The lower quartile shows that 25% of players made fewer than 1 two-pointer per game, and the upper quartile shows that 75% made fewer than 3. The median, marked by the line inside the interquartile box, is 2 two-pointers per game in a regular season.

Explaining Box Plot in TotalPointsPerGame Column: The TotalPointsPerGame box plot displays the five-number summary. The lower whisker shows that the minimum total points per game by an NBA player is 0, which can happen because of a player's position and shot selection or because the player was injured. The upper whisker shows a maximum of 36 total points per game. The lower quartile shows that 25% of players scored fewer than 4 total points per game, and the upper quartile shows that 75% scored fewer than 11. The median, marked by the line inside the interquartile box, is 7 total points per game in a regular season.

Keep in mind that this is historical data from the 2015-2020 NBA seasons, so the values in the columns above can fluctuate. Also, even though there are outliers in these columns, we will keep them because the columns play a vital role in testing whether shooting 3-pointers is a better strategy than shooting 2-pointers for making the NBA playoffs.

We reorder the columns in the table so that it is organized and easier for the reviewer to understand.

Below, I show the columns before reordering, where all the newly added columns are at the end of the table.

In this step, we move the newly added columns, which were at the end of the table, into their proper places.
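
Reordering can be done by indexing the dataframe with the desired column order. The order in `ordered_cols` below is only an assumed example (each per-game column placed next to its season total), and only a few columns are shown.

```python
import pandas as pd

# Empty frame with the calculated columns appended at the end.
overall_nba_playoffs_stats2 = pd.DataFrame(columns=[
    "Player", "GamesPlayed", "MinutesPlayed", "TotalPoints", "Playoff",
    "MinutesPerGame", "TotalPointsPerGame",
])

# Assumed target order: per-game columns next to their season totals.
ordered_cols = [
    "Player", "GamesPlayed", "MinutesPlayed", "MinutesPerGame",
    "TotalPoints", "TotalPointsPerGame", "Playoff",
]

overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2[ordered_cols]
print(list(overall_nba_playoffs_stats2.columns))
```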

In the final step, we make sure that the dataframe has the number of rows and columns expected from the cleanup process, and make a last check that there are no null values in the columns. There were 3196 rows and 32 columns at the beginning, after joining with the reference data. We then eliminated the 13 irrelevant columns and added 5 calculated reference columns for better analysis and interpretation of the data. A total of 24 rows were eliminated. After the Phase 1 cleanup, 3172 rows and 24 columns will be used for data visualization in the next phase.

STOP HERE for your EDA Phase 1 assignment. Submit your cleaned data file along with the completed notebook up to this point for grading.

EDA Phase 2

All of your work for the EDA Phase 2 assignment will begin below here. Refer to the detailed instructions and expectations for this assignment in Canvas.

NBA Statistics Heatmap Analysis

Explanation of how the visual cues of the heatmap represent the correlations.
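
The numbers behind a correlation heatmap come from a pairwise correlation matrix. A minimal sketch with made-up per-game values is shown here; a plotting call such as seaborn's `heatmap` could then color this matrix, but the colormap used in this notebook is not reproduced.

```python
import pandas as pd

# Made-up per-game values for four players.
stats = pd.DataFrame({
    "MinutesPerGame": [10.0, 20.0, 30.0, 35.0],
    "FieldGoalsPerGame": [1.0, 3.0, 5.0, 8.0],
    "TotalPointsPerGame": [3.0, 8.0, 14.0, 22.0],
})

# Pearson correlation between every pair of columns (values from -1 to 1).
corr = stats.corr()
print(corr.round(2))
```

Values near 1 indicate a strong positive relationship, values near 0 little linear relationship, and values near -1 a strong negative relationship.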

TotalPointsPerGame Correlations Analysis:

Based on the correlation matrix, the dark green color shows a high positive correlation between total points per game and three other columns: field goals, 2-pointers, and minutes played per game. This relationship makes sense because a player's total points per game depends on the minutes played and on the field goals made, whether 2-pointers or 3-pointers, and the Most Valuable Player of the game is awarded on that basis. Similarly, the light green color shows a slight positive correlation with 3-pointers, as the number of 3-pointers made by a player may be lower depending on the player's position.

FieldGoalsPerGame Correlations Analysis:

In the field goals per game row of the correlation matrix, dark green again shows a high positive correlation between field goals per game and three other columns: total points, 2-pointers, and minutes per game. As with total points, field goals depend on how many 2-pointers a player attempted and made, which are tallied into total points, and the minutes played also matter. The yellow color shows a moderate, near-neutral correlation between field goals and 3-pointers per game. Because field goals include both 2-point and 3-point shots, there are two possible explanations: either players attempted more 2-pointers per game, or they missed a lot of 3-pointers per game during the season.

3PointerPerGame Correlations with TotalPointsPerGame, FieldGoalsPerGame and MinutesPerGame Analysis:

Three-pointers per game shows a high positive relationship, in dark green, with total points and minutes per game, because it depends on how many minutes a player has played and because made three-pointers are tallied into total points per game. On the other hand, three-pointers per game shows a moderate, near-neutral correlation with field goals per game, because players may not have been scoring in proportion to the field goals they attempted per game.

2PointerPerGame Correlations with TotalPointsPerGame, FieldGoalPerGame and MinutesPerGame Analysis:

Two-pointers per game shows a high positive relationship, in dark green, with total points and field goals per game, because players may have been scoring roughly in proportion to the field goals they attempted per game, and because made two-pointers are tallied into total points per game. Minutes per game shows a medium positive relationship, because the number of minutes a player has played can affect their shooting in either a positive or a negative way.

3PointerPerGame Correlations with 2PointerPerGame Analysis:

Three-pointers and two-pointers per game show a strong negative correlation, in red, as they are two different shooting categories and do not depend on each other in their contribution to a player's overall statistics.

Seaborn Box Plot Analysis

Explain Outliers: GamesPlayed, Age, MinutesPerGame and FieldGoalsPerGame

There are no significant outliers in the GamesPlayed box plot. The outliers in the Age box plot are accurate: the upper extreme for an NBA player's age is 38, and ages beyond that are outliers, but based on the data there are several players around age 40 who are still in the NBA and on teams that make the playoffs. There are no significant outliers in the MinutesPerGame box plot. The FieldGoalsPerGame box plot shows accurate outliers because every season produces a group of players who achieve superior offensive statistics, making around 9 to 11 field goals per game. Also, FieldGoalsPerGame includes both three-point and two-point shots, and depending on position a player may shoot only 2-pointers or only 3-pointers.

Explain Outliers: 2PointerPerGame, 3PointerPerGame, and TotalPointsPerGame

The outliers for all of the following columns are accurate as represented by the box plots. The outliers for 2PointerPerGame lie beyond the upper extreme, at around 7 to 10 two-pointers per game. Similarly, the 3PointerPerGame box plot indicates that NBA players making around 3 to 5 three-pointers per game are outliers. Finally, the TotalPointsPerGame box plot looks reasonably accurate based on our data: some players might be injured or have played fewer minutes per game than others, which skews the distribution, so that NBA players scoring more than 22 points per game fall beyond the upper extreme and are outliers.

Although the overwhelming majority of players post modest offensive statistics in these shooting categories, the outliers play a vital role in our analysis because they are the NBA players leading in average two-pointers, three-pointers, and total points per game. Analyzing these outliers will help us understand and answer our hypothesis.

Swarm Plot Analysis 1: Based on NBA Player's Position

NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards

Explanation of Swarm Plot

A swarm plot shows individual points more clearly than a scatter plot, and the points are grouped by category as in a bar plot. Because we are comparing total points per game against three-pointers per game, the swarm plot lets us see how the different NBA positions have made three-pointers in our historical data. The red dots correspond to the Shooting Guard (SG) position. Players in this position have a high scoring range of around 2 to 4 three-pointers per game, which leads to an average of 10 to 25 total points per game, compared to other positions, which have a low range of 0 to 1 three-pointers per game.

The light orange dots correspond to the Point Guard (PG) position. Players in this position have a high scoring range of around 3 to 5 three-pointers per game, which leads to an average of 10 to 35 total points per game, compared to other positions.

It is interesting to note that players who play multiple positions (e.g., PF-C, SF-SG, PG-SG, PF-SF) do not score a significant number of total points per game. This suggests that players assigned multiple positions may have other unique responsibilities compared to the traditional NBA positions mentioned above.

Swarm Plot Analysis 2: Based on NBA Player's Position

NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards

Explaination of Swarm Plot

Based on the swarm plot the green and light green dots in the data corresponds to NBA Center (C) and Point Guard (PG) position which indicates that players in this position have an high average scoring range around 2 to 7 two pointer per game which leads them to having an average of 5 to 30 total points per games shooting two pointer compare to other positions.

The blue dots in the data corresponds NBA Power Forward (PF) position it shows that players in this position have an high average scoring range around 1 to 6 two pointer per game which leads them to having an average of 5 to 20 total points per games shooting two pointer compare to other positions. Also, it interesting to note that players who play multiple positions (e.g., PF-C, SF-SG, PG-SG, PF-SF) do not make significant amount of total points per game because they might be only be playing multiple positions sometimes during a season, not regularly compare to their normal position.

Pie Chart Analysis: Total 3 Pointers for Overall Season from 2015-2020 Categorized By NBA Player Positions

Hover over Data on NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards

Explanation of Pie Chart: NBA Player 3 Pointers Statistics Per Season from 2015-2020 Categorized by Positions

To check whether my hunch is correct, namely that players in the Point Guard (PG) and Shooting Guard (SG) positions score more three-pointers on average than players in other positions, we looked at the broader data with a pie chart. It represents the overall NBA players' three-pointer statistics per season from 2015 to 2020, sliced into percentages by position based on the three-pointers scored.

Shooting Guard (SG) and Point Guard (PG) Analysis:
We can see that 31.6% of the three-pointers were scored by players in the Shooting Guard (SG) position, a total of 43,471 three-pointers in our historical data. Similarly, 22.5% were scored by players in the Point Guard (PG) position, a total of 30,924 three-pointers over the five regular seasons. This shows that shooting guards scored more three-pointers than any other position in the NBA over this period.

Small Forwards (SF) and Power Forwards (PF) Analysis:
On the other hand, players at Small Forward (SF) scored a total of 27,797 three-pointers, or 20.2% of the data. The pie chart also indicates that 18.3% of three-pointers were scored by players in the Power Forward (PF) position. It is surprising that there is not much difference among the Small Forward, Power Forward, and Point Guard positions in the three-pointer category. This suggests that a player's position does not strictly determine their scoring style. One possible explanation is that players are trying to improve in both shooting categories over the years, which would lead to the close percentages we see.

Center (C) and Multiple Position Analysis:
Center (C) and the multiple-position groups have a combined share of only 7.4% of three-pointers over the five years of historical data. This makes them outliers, in line with our assumption that they are more likely to attempt two-pointers than three-pointers. Players listed at multiple positions also do not make a significant number of three-pointers, possibly because they play multiple positions only occasionally during a season rather than regularly, as they do their normal position.
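The percentages in a pie chart like this one come from summing the made shots per position and dividing by the overall total. A minimal sketch of that calculation is shown here, under stated assumptions: the numbers are made-up toy values, and "Position" and "3PointersMade" are hypothetical column names, not necessarily the ones in this notebook.

```python
import pandas as pd

# Toy season totals (made up). "Position" and "3PointersMade" are assumed column names.
totals = pd.DataFrame({
    "Position": ["SG", "PG", "SF", "PF", "C", "SG", "PG"],
    "3PointersMade": [120, 90, 80, 70, 20, 80, 40],
})

# Sum made three-pointers per position, then convert to a percentage of the total.
by_position = totals.groupby("Position")["3PointersMade"].sum()
share_pct = (by_position / by_position.sum() * 100).round(1).sort_values(ascending=False)
print(share_pct)
```

The same two lines applied to the two-pointer column would give the slices for the second pie chart.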

Pie Chart Analysis: Total 2 Pointers for Overall Season from 2015-2020 Categorized By NBA Player Positions

Hover over Data on NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards

Explanation of Pie Chart: NBA Player 2 Pointers Statistics Per Season from 2015-2020 Categorized by Positions

To check whether my hunch is correct, namely that players in the Power Forward (PF), Small Forward (SF), and Center (C) positions score more two-pointers on average than players in other positions, we looked at the broader data with a pie chart. It represents the overall NBA players' two-pointer statistics per season from 2015 to 2020, sliced into percentages by position based on the two-pointers scored.

Center (C) and Point Guard (PG) Analysis:
We can see that 24.2% of the two-pointers were scored by players in the Center (C) position, a total of 92,408 two-pointers in our historical data. Similarly, 20.3% were scored by players in the Point Guard (PG) position, a total of 77,449 two-pointers over the five regular seasons. This shows that centers scored more two-pointers than any other position in the NBA over this period.

Power Forwards (PF) and Shooting Guards (SG) Analysis:
On the other hand, players at Power Forward (PF) scored a total of 74,804 two-pointers, or 19.6% of the data. The pie chart also indicates that 19.6% of two-pointers were scored by players in the Shooting Guard (SG) position. It is surprising to see a tie between the Power Forward and Shooting Guard positions in the two-pointer category. This suggests that a player's position does not strictly determine their scoring style. One possible explanation is that players are trying to improve in both shooting categories over the years, which would lead to the tie in percentages between these two positions.

Small Forwards (SF) and Multiple Position Analysis:
Small Forward (SF) and the multiple-position groups have a combined share of only 16.3% of two-pointers over the five years of historical data. This makes them outliers and shows that our hunch was incorrect: we thought they were more likely to attempt two-pointers than three-pointers. It suggests a neutral correlation between a player's position and the offensive shooting category. The multiple-position groups contributed only 0.7% of the data, which indicates that they do not make a significant number of two-pointers, possibly because they play multiple positions only occasionally during a season rather than regularly, as they do their normal position.
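As a quick sanity check, the counts and percentages quoted in the two pie chart discussions can be compared with each other: dividing each count by its share should give roughly the same overall total for every position within one chart. The sketch below uses only the figures quoted in the text; the helper names are our own.

```python
# (count, share in percent) pairs quoted in the pie chart discussions.
three_pointers = {"SG": (43_471, 31.6), "PG": (30_924, 22.5), "SF": (27_797, 20.2)}
two_pointers = {"C": (92_408, 24.2), "PG": (77_449, 20.3), "PF": (74_804, 19.6)}


def implied_totals(quoted):
    """Return the overall total implied by each (count, percent) pair."""
    return {pos: count / (pct / 100) for pos, (count, pct) in quoted.items()}


def relative_spread(values):
    """Relative gap between the largest and smallest implied total."""
    values = list(values)
    return (max(values) - min(values)) / min(values)


for label, quoted in [("3-pointers", three_pointers), ("2-pointers", two_pointers)]:
    totals = implied_totals(quoted)
    rounded = {pos: round(total) for pos, total in totals.items()}
    print(label, rounded, f"spread={relative_spread(totals.values()):.2%}")
```

Because the percentages are rounded to one decimal place, small differences between the implied totals are expected; a large spread would point to a transcription error.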

Polar Line Plot Analysis:

Total 3 Pointer Per Game for Overall Season from 2015-2020 Categorized By Players Made Playoffs or Not

Hover over Year Angle for Leading Scorers in 3 Pointer Per Game Category
1. 2015-2016 shows green line
2. 2016-2017 shows green line
3. 2017-2018 shows green and pink line
4. 2018-2019 shows green and pink line
5. 2019-2020 shows orange line

Polar Line Plot Analysis:
The polar line plot helps us understand which NBA players led the three-pointers per game category from 2015 to 2020. Each player's line shows the number of three-pointers per game they scored at the point where it touches a particular year angle. Hovering also shows whether the leading player made the playoffs or not.

Let's start with the 2015-2016 and 2016-2017 regular seasons. Hovering over the green line at these year angles, we can see that Steph Curry led the three-pointer category in both years and made the playoffs.

Similarly, there is a tie between James Harden (pink line) and Steph Curry (green line), as both touch the 2017-2018 and 2018-2019 year angles with leading averages of 4 and 5 three-pointers per game for those years.

Finally, Damian Lillard (orange line) was the leading scorer with 4 three-pointers per game in 2019-2020. The data reveals that all 3 NBA players who led the three-pointer category made the playoffs. To further test our assumption, we have to look at the two-pointers per game category as well and see whether this pattern persists.
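The season leaders read off the polar plot, including ties, can also be pulled out directly from the data. This is a minimal sketch with made-up toy rows; "Player", "Year", and "Playoffs" are assumed column names, and the placeholder players A, B, and C do not refer to real players.

```python
import pandas as pd

# Toy data (made up). "Player", "Year" and "Playoffs" are assumed column names;
# "3PointerPerGame" follows the column name used in this notebook.
stats = pd.DataFrame({
    "Player": ["A", "B", "C", "A", "B", "C"],
    "Year": [2018, 2018, 2018, 2019, 2019, 2019],
    "3PointerPerGame": [5, 5, 3, 3, 2, 4],
    "Playoffs": ["Y", "Y", "N", "Y", "N", "Y"],
})

# Broadcast each season's maximum back onto its rows, then keep every row
# that matches it. Unlike idxmax, this keeps all tied leaders.
season_max = stats.groupby("Year")["3PointerPerGame"].transform("max")
leaders = stats[stats["3PointerPerGame"] == season_max].sort_values(["Year", "Player"])
print(leaders)

# True only if every season leader was on a playoff team.
all_leaders_made_playoffs = bool((leaders["Playoffs"] == "Y").all())
print(all_leaders_made_playoffs)
```

Swapping in the two-pointer column would answer the same question for the second polar plot.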

Total 2 Pointer Per Game for Overall Season from 2015-2020 Categorized By Players Made Playoffs or Not

Hover over Year Angle for Leading Scorers in 2 Pointer Per Game Category
1. 2015-2016 shows red and orange line
2. 2016-2017 shows red line
3. 2017-2018 shows red line
4. 2018-2019 shows red and blue line
5. 2019-2020 shows pink line

Polar Line Plot Analysis:
The polar line plot helps us understand which NBA players led the two-pointers per game category from 2015 to 2020. Each player's line shows the number of two-pointers per game they scored at the point where it touches a particular year angle. Hovering also shows whether the leading player made the playoffs or not.

Let's start with the 2015-2016, 2016-2017, 2017-2018, and 2018-2019 regular seasons. Hovering over the red line at these year angles, we can see that Anthony Davis led the two-pointer category in these four years but made the playoffs only once in those 4 years.

In addition, there is a tie between LaMarcus Aldridge (orange line) and Anthony Davis (red line), as both touch the 2015-2016 year angle with a leading average of 9 two-pointers per game. Only LaMarcus Aldridge made the playoffs that year; Anthony Davis did not.

Similarly, there is a tie between Giannis Antetokounmpo (blue line) and Anthony Davis (red line), as both touch the 2018-2019 year angle with a leading average of 9 two-pointers per game. Only Giannis Antetokounmpo made the playoffs that year; Anthony Davis did not.

Finally, Russell Westbrook (pink line) was the leading scorer with 10 two-pointers per game in 2019-2020. The data reveals that only 2 out of 3 NBA players who led the two-pointer category made the playoffs. So our assumption was wrong: the pattern from the previous polar plot, that players leading a shooting category always make the playoffs, did not persist.

Scatter Plot Analysis: GamesPlayed vs MinutesPerGame from 2015-2020

Hover Over Data For Additional Information
1. Player Name
2. Total Points Scored from 2015-2020
3. Games Played
4. Minutes Per Game
5. N - Did not make Playoffs (Blue Dots)
6. Y - Made it to the Playoffs (Red Dots)

Scatter Plot Analysis:
The main goal of this scatter plot is to analyze whether the data indicates any correlation between GamesPlayed and MinutesPerGame, and how that affects a player's chances of making the playoffs.

Looking at the trendline in the scatter plot, we see many red dots in the upper right quadrant. NBA players who played around 75 to 80 games and averaged more than 30 minutes per game had a better chance of making the playoffs and scored more points, based on the historical data.

On the other hand, the trendline for the blue dots runs from the lower left quadrant to the middle of the upper right quadrant. NBA players who played around 20 to 65 games and averaged less than 30 minutes per game had a lower chance of making the playoffs and scored fewer points, based on the historical data.

There are some exceptions, however. For instance, there are blue dots in the upper right quadrant, where some NBA players played all 82 games or more than 75 games and averaged more than 30 minutes per game but still did not make the playoffs. Based on our analysis, we conclude that there is a neutral correlation between GamesPlayed and MinutesPerGame, and that a player's chances of making the playoffs also depend on overall team statistics.
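The visual impression from the scatter plot can be backed up with two numbers: the correlation between the two columns, and the playoff rate inside and outside the upper right quadrant. The sketch below uses made-up toy rows, so its output says nothing about the real data; "Playoffs" is an assumed column name, and the 75-game and 30-minute cut-offs are taken from the discussion above.

```python
import pandas as pd

# Toy rows (made up). "Playoffs" is an assumed column name; "GamesPlayed" and
# "MinutesPerGame" follow the column names used in this notebook.
games = pd.DataFrame({
    "GamesPlayed": [82, 78, 76, 80, 60, 45, 30, 22],
    "MinutesPerGame": [34.0, 33.5, 31.0, 36.2, 25.0, 18.5, 15.0, 12.0],
    "Playoffs": ["N", "Y", "Y", "Y", "N", "Y", "N", "N"],
})

# Pearson correlation between the two plotted columns.
r = games["GamesPlayed"].corr(games["MinutesPerGame"])

# Players in the "upper right quadrant": at least 75 games and over 30 minutes per game.
heavy_use = (games["GamesPlayed"] >= 75) & (games["MinutesPerGame"] > 30)

# Share of players who made the playoffs inside and outside that quadrant.
rate_heavy = (games.loc[heavy_use, "Playoffs"] == "Y").mean()
rate_other = (games.loc[~heavy_use, "Playoffs"] == "Y").mean()

print(f"r = {r:.2f}")
print(f"playoff rate, heavy use: {rate_heavy:.0%}; others: {rate_other:.0%}")
```

A blue dot in the upper right quadrant corresponds to a row where heavy_use is true but Playoffs is "N", like the first toy row.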

Heat Map Analysis: Individual NBA Player Statistics Based on Average 3 Pointers Per Game

Since the 3PointerPerGame shooting category was highly correlated with overall player statistics, we will use the 3PointerPerGame score to find the top 50 players with a high average in the three-pointer category. The following heat map analysis will help reveal whether our hunch is correct or incorrect, namely that NBA players who average a high number of 3-pointers per game are more likely to be on teams that make the NBA playoffs.

Let's look at the top 50 individual NBA player statistics, grouped into four categories (team, position, playoff, year) and sorted by average 3-pointers per game.

Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Average 3 Pointer Per Game

Among the top 50 NBA player statistics, 5 three-pointers per game is the highest average in the three-pointer category, scored by Steph Curry and James Harden. Both were on teams that made the playoffs, and both have been consistent in their performance over the years based on the data.

Looking at the broader picture in the heat map, some players appear twice in the data above, either because they played for different teams or because they performed consistently well at multiple positions in different years, averaging a high number of three-pointers per game. The main point, however, is that 23 out of 50 players were not on teams that made the playoffs, which is 46% of the data. This suggests that our hunch is incorrect: players who average a high number of 3-pointers per game are not necessarily more likely to be on teams that make the NBA playoffs. To further test our assumption, we will narrow the data down to the top 10 NBA players based on average 3-pointers per game.

Top 10 NBA Player Statistics based on Average 3 Pointer Per Game

Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average 3 Pointer Per Game

Now that we have narrowed the data down to the top 10 NBA players by average three-pointers per game, we can see that some players have been consistently performing well in their individual shooting statistics. Steph Curry, for example, led in the years 2018 and 2015 and was on teams that made the playoffs. Similarly, James Harden tied with a shooting average of 5 three-pointers per game.

On the other hand, there are players such as D'Angelo Russell, who scored consistently with 4 three-pointers per game in the year 2019 for two different teams but did not make the playoffs. In addition, the data reveals that 4 out of 10 players in the leading three-pointer category were not on teams that made the playoffs, which is 40% of the data above. This further suggests that individual player statistics do not reliably correlate with making the playoffs or being on a team that makes the playoffs.
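The "N out of the top K were not on playoff teams" figures used in the heat map discussions can be computed with a small helper. This is a sketch under stated assumptions: the rows are made-up toy values, "Playoffs" is an assumed column name, and non_playoff_share is a hypothetical helper, not a function from this notebook.

```python
import pandas as pd


def non_playoff_share(df, stat, n, playoff_col="Playoffs"):
    """Return the top-n rows by `stat` and the percentage of them marked "N"."""
    top = df.nlargest(n, stat)
    share = (top[playoff_col] == "N").mean() * 100
    return top, round(share, 1)


# Toy rows (made up); P1..P12 are placeholder names, not real players.
toy = pd.DataFrame({
    "Player": [f"P{i}" for i in range(1, 13)],
    "3PointerPerGame": [5.0, 4.9, 4.5, 4.2, 4.0, 3.9, 3.8, 3.7, 3.6, 3.5, 1.0, 0.5],
    "Playoffs": ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y", "N", "N"],
})

top10, share = non_playoff_share(toy, "3PointerPerGame", 10)
print(top10[["Player", "3PointerPerGame", "Playoffs"]])
print(f"{share}% of the top 10 were not on playoff teams")
```

Passing a different column name for stat (for example 2PointerPerGame, FieldGoalsPerGame, or TotalPointsPerGame) and n=50 or n=10 would reproduce the other comparisons in the same way.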

Heat Map Analysis: Individual NBA Player Statistics Based on Average 2 Pointers Per Game

We should now analyze the 2PointerPerGame shooting category against overall player statistics to see whether the data reveals patterns similar to the three-pointer category. We will use the 2PointerPerGame score to find the top 50 players with a high average in the two-pointer category. The following heat map analysis will help reveal whether our hunch is correct or incorrect, namely that NBA players who average a high number of 2-pointers per game are more likely to be on teams that make the NBA playoffs.

Let's look at the top 50 individual NBA player statistics, grouped into four categories (team, position, playoff, year) and sorted by average 2-pointers per game.

Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Average 2 Pointer Per Game

Among the top 50 NBA player statistics, 10 two-pointers per game is the highest average in the two-pointer category, scored by Anthony Davis and Russell Westbrook. Both were on teams that made the playoffs, and both have been consistent in their performance over the years based on the data. At the same time, Anthony Davis also missed the playoffs while playing for the same team in different seasons. This shows that a good performance in the two-pointer category does not guarantee a playoff spot.

Looking at the broader picture in the heat map, some players such as Anthony Davis, LeBron James, and Russell Westbrook appear twice in the data above, either because they played for different teams or because they performed consistently well at multiple positions in different years, averaging a high number of two-pointers per game. The main point, however, is that 18 out of 50 players were not on teams that made the playoffs, which is 36% of the data. This suggests that our hunch is incorrect: players who average a high number of 2-pointers per game are not necessarily more likely to be on teams that make the NBA playoffs. To further test our assumption, we will narrow the data down to the top 10 NBA players based on average 2-pointers per game.

Top 10 NBA Player Statistics based on Average 2 Pointer Per Game

Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average 2 Pointer Per Game

Now that we have narrowed the data down to the top 10 NBA players by average two-pointers per game, we can see that some players have been consistently performing well in their individual shooting statistics. Anthony Davis, for example, led in the years 2017 and 2016 and was on teams that made the playoffs. At the same time, Anthony Davis averaged 9 two-pointers per game in the years 2018 and 2015 but did not make the playoffs with the same team.

On the other hand, two players, Karl-Anthony Towns and DeMar DeRozan, both scored consistently with an average of 9 two-pointers per game in the same 2016-2017 season on two different teams, but only one of them made the playoffs. The data indicates that overall team performance plays a more vital role in making the playoffs than individual player statistics alone.

In addition, the data reveals that 4 out of 10 players in the leading two-pointer category were not on teams that made the playoffs, which is 40% of the data above. This further suggests that individual player statistics do not reliably correlate with making the playoffs or being on a team that makes the playoffs.

Heat Map Analysis: Individual NBA Player Statistics Based on Average Field Goals Per Game

We should now analyze the FieldGoalsPerGame shooting category against overall player statistics to see whether the data reveals patterns similar to the two-pointer category. We will use the FieldGoalsPerGame score to find the top 50 players with a high average in the field goals category. This column includes both three-point and two-point shots, so it covers players who, depending on their position, shoot mostly 2-pointers or mostly 3-pointers. The following heat map analysis will help reveal whether our hunch is correct or incorrect, namely that NBA players who average a high number of field goals per game are more likely to be on teams that make the NBA playoffs.

Let's look at the top 50 individual NBA player statistics, grouped into four categories (team, position, playoff, year) and sorted by average field goals per game.

Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Field Goals Per Game

Among the top 50 NBA player statistics, 11 field goals per game is the highest average in the field goals category, scored by James Harden and Russell Westbrook, who were on the same team, and by Giannis Antetokounmpo, who was on a different team. All of them were on teams that made the playoffs and have been consistent in their performance over the years based on the data.

Looking at the broader picture in the heat map, some players such as LeBron James, Kyrie Irving, and Kevin Durant appear more than twice in the data above, either because they played for different teams or because they performed consistently well at multiple positions in different years, averaging a high number of field goals per game. The main point, however, is that 10 out of 50 players were not on teams that made the playoffs, which is 20% of the data.

Since the field goals column includes both 2-pointers and 3-pointers per game, this suggests that our hunch is incorrect regarding players who average a high number of 3-pointers or 2-pointers alone being more likely to be on teams that make the NBA playoffs. According to the field goals per game analysis above, players should be able to score well in both shooting categories. To further test our assumption, we will narrow the data down to the top 10 NBA players based on average field goals per game.

Top 10 NBA Player Statistics based on Average Field Goals Per Game

Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average Field Goals Per Game

Now that we have narrowed the data down to the top 10 NBA players by average field goals per game, we can see that some players have been consistently performing well in their individual shooting statistics. Giannis Antetokounmpo, for example, led in the years 2019 and 2018 and was on the same team, which made the playoffs. Similarly, James Harden and Russell Westbrook tied with a shooting average of 11 field goals per game on the same team.

On the other hand, there are players such as Anthony Davis, who scored consistently with 10 field goals per game in the years 2016 and 2017 with the same team but made the playoffs only once. In addition, the data reveals that 1 out of 10 NBA players in the above heat map analysis was not on a team that made the playoffs, which is 10% of the data above. This suggests that individual player statistics in the field goals per game category do correlate with making the playoffs or being on a team that makes the playoffs.

Heat Map Analysis: Individual NBA Player Statistics Based on Average Total Points Per Game

We should now analyze the TotalPointsPerGame category against overall player statistics to see whether the data reveals patterns similar to the field goals category. We will use the TotalPointsPerGame score to find the top 50 players with a high average in the total points per game category. The following heat map analysis will help reveal whether our hunch is correct or incorrect, namely that NBA players who average a high number of total points per game are more likely to be on teams that make the NBA playoffs.

Let's look at the top 50 individual NBA player statistics, grouped into four categories (team, position, playoff, year) and sorted by average total points per game.

Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Total Points Per Game

Among the top 50 NBA player statistics, 36 total points per game is the highest average in the total points category, scored by James Harden, while Russell Westbrook averaged 32 total points per game. They were on different teams that both made the playoffs, and both have been consistent in their performance over the years based on the data. At the same time, Bradley Beal and Trae Young were also among the top 4 leading scorers in the total points category in the 2019-2020 season, but their teams did not make the playoffs. This shows that a good performance in the total points per game category does not guarantee a playoff spot.

Looking at the broader picture in the heat map, some players such as LeBron James, Anthony Davis, Bradley Beal, and Kevin Durant appear more than twice in the data above, either because they played for different teams or because they performed consistently well at multiple positions in different years, averaging a high number of total points per game. The main point, however, is that 16 out of 50 players were not on teams that made the playoffs, which is 32% of the data. This suggests that our hunch is correct as far as total points per game is concerned: the analysis above shows that players should be able to score well in both shooting categories. To further test our assumption, we will narrow the data down to the top 10 NBA players based on average total points per game.

Top 10 NBA Player Statistics based on Average Total Points Per Game

Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average Total Points Per Game

Now that we have narrowed the data down to the top 10 NBA players by average total points per game, we can see that some players have been consistently performing well in their individual statistics. James Harden, for example, led in points in the years 2019 and 2018 and was on the same team, which made the playoffs. Similarly, Russell Westbrook averaged 32 total points per game on a different team that made the playoffs.

On the other hand, there are players such as Bradley Beal, who scored 31 total points per game, and Trae Young, who scored 30 total points per game, in the same 2019-2020 regular season with different teams, and neither made the playoffs. In addition, the data reveals that 2 out of 10 players from the above data were not on teams that made the playoffs, which is 20% of the data above. This suggests that individual player statistics in the total points per game category do correlate with making the playoffs, but the outcome also depends on overall team statistics.

Project Findings

The visualizations revealed that a high scoring average in a shooting category does not necessarily guarantee a spot in the playoffs; other statistical categories also play a vital role in achieving that goal. Hypothesis 1 was partially supported: the swarm plots and pie charts show that some positions, such as Shooting Guard (SG) and Center (C), scored more three-pointers and two-pointers respectively, which matched our hunch. The other positions were less clear-cut, possibly because players are trying to improve in both the three-pointer and two-pointer categories over the years, which led to the tie and the close percentages we saw.

Hypothesis 2 was partially supported: NBA players who played around 80 games and averaged more than 30 minutes per game had a better chance of making the playoffs and scored more points. The overall analysis shows a slight positive correlation between the GamesPlayed and MinutesPerGame columns when they are categorized by playoffs. The assumptions made in hypothesis 3 were partially supported as well, because the data revealed that 4 out of 10 players leading the three-pointer and two-pointer categories were not on teams that made the playoffs, which is 40% of the data. This further suggests that individual shooting statistics show a neutral correlation with a player's chances of making the playoffs, because those chances depend on the overall team's statistics.

I did not encounter difficulties with the data, but it would be interesting to gather overall team statistics from 2015 to 2020 and merge them with the current dataset of player and playoff statistics. We could then analyze other categories, which may reveal better correlations between team and player columns and show how they affect the chances of making the playoffs, possibly improving the outcomes of the analysis. To take it a step further, I would also analyze the dynamics of each NBA position, because players are improving in both the three-pointer and two-pointer categories over the years.
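The merge proposed above could look roughly like the following. This is only a sketch of a possible future step: the team table does not exist in the current project, all values are made up, and "Team", "Year", and "TeamWins" are assumed column names with placeholder team codes T1 and T2.

```python
import pandas as pd

# Toy player rows (made up); T1 and T2 are placeholder team codes.
players = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Team": ["T1", "T1", "T2"],
    "Year": [2019, 2020, 2019],
    "TotalPointsPerGame": [21.3, 18.0, 27.4],
})

# Hypothetical team-level table with one row per team and season.
team_stats = pd.DataFrame({
    "Team": ["T1", "T2"],
    "Year": [2019, 2019],
    "TeamWins": [40, 55],
})

# A left join keeps every player row; validate guards against duplicate
# team-season rows that would silently multiply player rows.
merged = players.merge(team_stats, on=["Team", "Year"], how="left", validate="many_to_one")
print(merged)
```

Player rows without a matching team-season (player B here) get a missing TeamWins value, which makes gaps in the team data easy to spot before analysis.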